An analysis of a policing dataset from Dallas, Texas in 2016

Introduction

The goal of this task is to provide an analysis of a policing dataset from Dallas, Texas in 2016. The analysis helps identify patterns and relationships between variables, and so shows how they affect each other. With police data, analyzing trends can reveal relationships between variables such as crime types, locations, and time of day. This gives insight into where and when crimes are most likely to occur, which can help police departments allocate resources more effectively. Trend analysis can also help anticipate future crime patterns, which can be used to prevent crime and improve public safety. For example, an increase in a certain type of crime would signal that a police department should increase its focus and efforts in the affected areas to prevent further incidents.

Identifying outliers in policing data can also be crucial. Outliers in crime data could indicate unique events that do not occur often. Such events may require special attention, so this information matters to police departments that need to be prepared for them.

Data is collected and analysed so that it can be used to make informed decisions. Analysis of a data set helps people identify trends and patterns that can improve decision-making: areas that need improvement, resources that need to be allocated, and changes that need to be made. Decisions grounded in data are better informed. This is particularly important for a police department, which can use the information gained to prevent crime and improve public safety.

Initial Exploration of the data set

Conducting an initial exploration of a given data set is a crucial step. It helps the analyst gain a better understanding of the data and the variables in the dataset, which in turn helps in developing the hypotheses and questions that guide further analysis. Exploring a data set also helps to identify errors, inconsistencies, or missing values. Dealing with these early helps ensure that the data is accurate and reliable, which matters because errors can affect the results of the analysis. Finally, the exploration indicates which preprocessing or cleaning tasks need to be done, such as converting variables to the appropriate type and imputing missing values where necessary.

The first task is to load the data set into the R environment. A copy of the data frame is also created to preserve the original data as collected. This provides a level of safety and flexibility when working with the data, and helps ensure that an intact version remains available throughout the analysis.

data <- read.csv("37-00049_UOF-P_2016_prepped.csv")
df <- data # Keep an unmodified copy of the data as read from the file

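The data set in use contains 2384 observations of 47 variables. The output of str() shows that every variable has been read as character, even those that hold numbers or dates. The first value of each column (for example "OCCURRED_D" for INCIDENT_DATE) is a second label rather than an observation, which by itself prevents any column from being read as numeric.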

str(data) # shows the type and first few values of each variable
## 'data.frame':    2384 obs. of  47 variables:
##  $ INCIDENT_DATE                               : chr  "OCCURRED_D" "9/3/16" "3/22/16" "5/22/16" ...
##  $ INCIDENT_TIME                               : chr  "OCCURRED_T" "4:14:00 AM" "11:00:00 PM" "1:29:00 PM" ...
##  $ UOF_NUMBER                                  : chr  "UOFNum" "37702" "33413" "34567" ...
##  $ OFFICER_ID                                  : chr  "CURRENT_BA" "10810" "7706" "11014" ...
##  $ OFFICER_GENDER                              : chr  "OffSex" "Male" "Male" "Male" ...
##  $ OFFICER_RACE                                : chr  "OffRace" "Black" "White" "Black" ...
##  $ OFFICER_HIRE_DATE                           : chr  "HIRE_DT" "5/7/14" "1/8/99" "5/20/15" ...
##  $ OFFICER_YEARS_ON_FORCE                      : chr  "INCIDENT_DATE_LESS_" "2" "17" "1" ...
##  $ OFFICER_INJURY                              : chr  "OFF_INJURE" "No" "Yes" "No" ...
##  $ OFFICER_INJURY_TYPE                         : chr  "OFF_INJURE_DESC" "No injuries noted or visible" "Sprain/Strain" "No injuries noted or visible" ...
##  $ OFFICER_HOSPITALIZATION                     : chr  "OFF_HOSPIT" "No" "Yes" "No" ...
##  $ SUBJECT_ID                                  : chr  "CitNum" "46424" "44324" "45126" ...
##  $ SUBJECT_RACE                                : chr  "CitRace" "Black" "Hispanic" "Hispanic" ...
##  $ SUBJECT_GENDER                              : chr  "CitSex" "Female" "Male" "Male" ...
##  $ SUBJECT_INJURY                              : chr  "CIT_INJURE" "Yes" "No" "No" ...
##  $ SUBJECT_INJURY_TYPE                         : chr  "SUBJ_INJURE_DESC" "Non-Visible Injury/Pain" "No injuries noted or visible" "No injuries noted or visible" ...
##  $ SUBJECT_WAS_ARRESTED                        : chr  "CIT_ARREST" "Yes" "Yes" "Yes" ...
##  $ SUBJECT_DESCRIPTION                         : chr  "CIT_INFL_A" "Mentally unstable" "Mentally unstable" "Unknown" ...
##  $ SUBJECT_OFFENSE                             : chr  "CitChargeT" "APOWW" "APOWW" "APOWW" ...
##  $ REPORTING_AREA                              : chr  "RA" "2062" "1197" "4153" ...
##  $ BEAT                                        : chr  "BEAT" "134" "237" "432" ...
##  $ SECTOR                                      : chr  "SECTOR" "130" "230" "430" ...
##  $ DIVISION                                    : chr  "DIVISION" "CENTRAL" "NORTHEAST" "SOUTHWEST" ...
##  $ LOCATION_DISTRICT                           : chr  "DIST_NAME" "D14" "D9" "D6" ...
##  $ STREET_NUMBER                               : chr  "STREET_N" "211" "7647" "716" ...
##  $ STREET_NAME                                 : chr  "STREET" "Ervay" "Ferguson" "bimebella dr" ...
##  $ STREET_DIRECTION                            : chr  "street_g" "N" "NULL" "NULL" ...
##  $ STREET_TYPE                                 : chr  "street_t" "St." "Rd." "Ln." ...
##  $ LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION: chr  "Street Address" "211 N ERVAY ST" "7647 FERGUSON RD" "716 BIMEBELLA LN" ...
##  $ LOCATION_CITY                               : chr  "City" "Dallas" "Dallas" "Dallas" ...
##  $ LOCATION_STATE                              : chr  "State" "TX" "TX" "TX" ...
##  $ LOCATION_LATITUDE                           : chr  "Latitude" "32.782205" "32.798978" "32.73971" ...
##  $ LOCATION_LONGITUDE                          : chr  "Longitude" "-96.797461" "-96.717493" "-96.92519" ...
##  $ INCIDENT_REASON                             : chr  "SERVICE_TY" "Arrest" "Arrest" "Arrest" ...
##  $ REASON_FOR_FORCE                            : chr  "UOF_REASON" "Arrest" "Arrest" "Arrest" ...
##  $ TYPE_OF_FORCE_USED1                         : chr  "ForceType1" "Hand/Arm/Elbow Strike" "Joint Locks" "Take Down - Group" ...
##  $ TYPE_OF_FORCE_USED2                         : chr  "ForceType2" "" "" "" ...
##  $ TYPE_OF_FORCE_USED3                         : chr  "ForceType3" "" "" "" ...
##  $ TYPE_OF_FORCE_USED4                         : chr  "ForceType4" "" "" "" ...
##  $ TYPE_OF_FORCE_USED5                         : chr  "ForceType5" "" "" "" ...
##  $ TYPE_OF_FORCE_USED6                         : chr  "ForceType6" "" "" "" ...
##  $ TYPE_OF_FORCE_USED7                         : chr  "ForceType7" "" "" "" ...
##  $ TYPE_OF_FORCE_USED8                         : chr  "ForceType8" "" "" "" ...
##  $ TYPE_OF_FORCE_USED9                         : chr  "ForceType9" "" "" "" ...
##  $ TYPE_OF_FORCE_USED10                        : chr  "ForceType10" "" "" "" ...
##  $ NUMBER_EC_CYCLES                            : chr  "Cycles_Num" "NULL" "NULL" "NULL" ...
##  $ FORCE_EFFECTIVE                             : chr  "ForceEffec" " Yes" " Yes" " Yes" ...
dim(data) # gives the dimensions of the data frame
## [1] 2384   47
names(data) # gives the names of the variables
##  [1] "INCIDENT_DATE"                               
##  [2] "INCIDENT_TIME"                               
##  [3] "UOF_NUMBER"                                  
##  [4] "OFFICER_ID"                                  
##  [5] "OFFICER_GENDER"                              
##  [6] "OFFICER_RACE"                                
##  [7] "OFFICER_HIRE_DATE"                           
##  [8] "OFFICER_YEARS_ON_FORCE"                      
##  [9] "OFFICER_INJURY"                              
## [10] "OFFICER_INJURY_TYPE"                         
## [11] "OFFICER_HOSPITALIZATION"                     
## [12] "SUBJECT_ID"                                  
## [13] "SUBJECT_RACE"                                
## [14] "SUBJECT_GENDER"                              
## [15] "SUBJECT_INJURY"                              
## [16] "SUBJECT_INJURY_TYPE"                         
## [17] "SUBJECT_WAS_ARRESTED"                        
## [18] "SUBJECT_DESCRIPTION"                         
## [19] "SUBJECT_OFFENSE"                             
## [20] "REPORTING_AREA"                              
## [21] "BEAT"                                        
## [22] "SECTOR"                                      
## [23] "DIVISION"                                    
## [24] "LOCATION_DISTRICT"                           
## [25] "STREET_NUMBER"                               
## [26] "STREET_NAME"                                 
## [27] "STREET_DIRECTION"                            
## [28] "STREET_TYPE"                                 
## [29] "LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION"
## [30] "LOCATION_CITY"                               
## [31] "LOCATION_STATE"                              
## [32] "LOCATION_LATITUDE"                           
## [33] "LOCATION_LONGITUDE"                          
## [34] "INCIDENT_REASON"                             
## [35] "REASON_FOR_FORCE"                            
## [36] "TYPE_OF_FORCE_USED1"                         
## [37] "TYPE_OF_FORCE_USED2"                         
## [38] "TYPE_OF_FORCE_USED3"                         
## [39] "TYPE_OF_FORCE_USED4"                         
## [40] "TYPE_OF_FORCE_USED5"                         
## [41] "TYPE_OF_FORCE_USED6"                         
## [42] "TYPE_OF_FORCE_USED7"                         
## [43] "TYPE_OF_FORCE_USED8"                         
## [44] "TYPE_OF_FORCE_USED9"                         
## [45] "TYPE_OF_FORCE_USED10"                        
## [46] "NUMBER_EC_CYCLES"                            
## [47] "FORCE_EFFECTIVE"
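
The str() output above shows that missing entries appear as the string "NULL" or as empty strings rather than as NA. The following is a minimal sketch, on a small hypothetical data frame (`toy`) that borrows a few column names from the data set, of how such markers could be counted per column and recoded to NA. The helper `is_missing` is an illustrative name, not part of the original analysis.

```r
# Toy data frame; the values are illustrative, not taken from the full file
toy <- data.frame(
  STREET_DIRECTION    = c("N", "NULL", "NULL"),
  TYPE_OF_FORCE_USED2 = c("", "", "Take Down - Arm"),
  DIVISION            = c("CENTRAL", "NORTHEAST", "SOUTHWEST"),
  stringsAsFactors    = FALSE
)

# Hypothetical helper: TRUE for NA, empty strings and the literal "NULL"
is_missing <- function(x) is.na(x) | trimws(x) %in% c("", "NULL")

# Number of missing markers in each column
missing_counts <- vapply(toy, function(x) sum(is_missing(x)), integer(1))
print(missing_counts)

# Recode the markers to NA so that is.na() and na.rm = TRUE work as expected
toy[] <- lapply(toy, function(x) replace(x, is_missing(x), NA))
```

Counting the markers before recoding them gives a quick picture of how complete each variable is.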

Checking for duplicates helps ensure that the data being analyzed is accurate and of high quality. No duplicates were found in the data set; the output below shows the duplicate flags for the first 100 rows.

# Get a logical vector that is TRUE for rows duplicating an earlier row
duplicates_mask <- duplicated(data)

# Select only the first 100 elements of the logical vector
first_100_duplicates <- head(duplicates_mask, n = 100)

# Print the duplicate flags for the first 100 rows
print(first_100_duplicates)
##   [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [97] FALSE FALSE FALSE FALSE
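
Printing the flags covers only part of the data frame. As a sketch of how the whole data frame could be summarised in one number, the example below uses a small hypothetical data frame (`toy`) with one repeated row; `sum(duplicated(...))` counts duplicated rows and `anyDuplicated()` returns the index of the first one, or 0 if there is none.

```r
# Toy data frame with one repeated row (rows 2 and 3 are identical)
toy <- data.frame(
  id    = c(1, 2, 2, 3),
  force = c("a", "b", "b", "c"),
  stringsAsFactors = FALSE
)

n_dup     <- sum(duplicated(toy))   # number of rows that repeat an earlier row
first_dup <- anyDuplicated(toy)     # index of the first duplicate, 0 if none

# Keep only the first occurrence of each row
toy_unique <- toy[!duplicated(toy), ]

print(n_dup)
print(first_dup)
print(nrow(toy_unique))
```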

The initial exploration showed that the first row of the data frame contains a second set of labels for the variables. This is redundant, given that the variables already have names, and it means that the labels are counted as an observation. Removing this row reduces the number of observations in the data frame to 2383.

Data Cleaning

data <- data[2:nrow(data), ] 
dim(data)
## [1] 2383   47
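
Dropping a label row does not change the column types: every column stays character until it is converted. As one possible shortcut, the sketch below uses `type.convert()` from the utils package to re-infer the types of a small hypothetical data frame (`toy`) after its label row is removed. Treating "NULL" and empty strings as missing is an assumption made for this example, and the toy values are illustrative.

```r
# Toy data frame whose first row is a label row, as in the data set
toy <- data.frame(
  OFFICER_YEARS_ON_FORCE = c("INCIDENT_DATE_LESS_", "2", "17", "1"),
  NUMBER_EC_CYCLES       = c("Cycles_Num", "NULL", "3", "NULL"),
  OFFICER_GENDER         = c("OffSex", "Male", "Male", "Female"),
  stringsAsFactors       = FALSE
)

toy <- toy[-1, ]  # drop the label row

# Re-infer column types; as.is = TRUE keeps text as character, not factor
toy_typed <- type.convert(toy, as.is = TRUE, na.strings = c("NA", "NULL", ""))

print(sapply(toy_typed, class))
```

This approach converts every column at once, so columns that need special handling (dates, identifiers) would still have to be treated individually afterwards.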

The next step is to identify the type of each variable in the data frame. This is important because it determines which data cleaning and analysis techniques are appropriate. It also helps identify potential errors that could prevent a consistent analysis. Getting the types right also helps ensure that the findings can be communicated effectively.

It is observed that all variables have been classified as character, even though some of them represent dates and numbers.

sapply(data, is.character) # TRUE for every variable stored as character
##                                INCIDENT_DATE 
##                                         TRUE 
##                                INCIDENT_TIME 
##                                         TRUE 
##                                   UOF_NUMBER 
##                                         TRUE 
##                                   OFFICER_ID 
##                                         TRUE 
##                               OFFICER_GENDER 
##                                         TRUE 
##                                 OFFICER_RACE 
##                                         TRUE 
##                            OFFICER_HIRE_DATE 
##                                         TRUE 
##                       OFFICER_YEARS_ON_FORCE 
##                                         TRUE 
##                               OFFICER_INJURY 
##                                         TRUE 
##                          OFFICER_INJURY_TYPE 
##                                         TRUE 
##                      OFFICER_HOSPITALIZATION 
##                                         TRUE 
##                                   SUBJECT_ID 
##                                         TRUE 
##                                 SUBJECT_RACE 
##                                         TRUE 
##                               SUBJECT_GENDER 
##                                         TRUE 
##                               SUBJECT_INJURY 
##                                         TRUE 
##                          SUBJECT_INJURY_TYPE 
##                                         TRUE 
##                         SUBJECT_WAS_ARRESTED 
##                                         TRUE 
##                          SUBJECT_DESCRIPTION 
##                                         TRUE 
##                              SUBJECT_OFFENSE 
##                                         TRUE 
##                               REPORTING_AREA 
##                                         TRUE 
##                                         BEAT 
##                                         TRUE 
##                                       SECTOR 
##                                         TRUE 
##                                     DIVISION 
##                                         TRUE 
##                            LOCATION_DISTRICT 
##                                         TRUE 
##                                STREET_NUMBER 
##                                         TRUE 
##                                  STREET_NAME 
##                                         TRUE 
##                             STREET_DIRECTION 
##                                         TRUE 
##                                  STREET_TYPE 
##                                         TRUE 
## LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION 
##                                         TRUE 
##                                LOCATION_CITY 
##                                         TRUE 
##                               LOCATION_STATE 
##                                         TRUE 
##                            LOCATION_LATITUDE 
##                                         TRUE 
##                           LOCATION_LONGITUDE 
##                                         TRUE 
##                              INCIDENT_REASON 
##                                         TRUE 
##                             REASON_FOR_FORCE 
##                                         TRUE 
##                          TYPE_OF_FORCE_USED1 
##                                         TRUE 
##                          TYPE_OF_FORCE_USED2 
##                                         TRUE 
##                          TYPE_OF_FORCE_USED3 
##                                         TRUE 
##                          TYPE_OF_FORCE_USED4 
##                                         TRUE 
##                          TYPE_OF_FORCE_USED5 
##                                         TRUE 
##                          TYPE_OF_FORCE_USED6 
##                                         TRUE 
##                          TYPE_OF_FORCE_USED7 
##                                         TRUE 
##                          TYPE_OF_FORCE_USED8 
##                                         TRUE 
##                          TYPE_OF_FORCE_USED9 
##                                         TRUE 
##                         TYPE_OF_FORCE_USED10 
##                                         TRUE 
##                             NUMBER_EC_CYCLES 
##                                         TRUE 
##                              FORCE_EFFECTIVE 
##                                         TRUE
sapply(data, is.numeric) # FALSE everywhere: no variable is stored as numeric
##                                INCIDENT_DATE 
##                                        FALSE 
##                                INCIDENT_TIME 
##                                        FALSE 
##                                   UOF_NUMBER 
##                                        FALSE 
##                                   OFFICER_ID 
##                                        FALSE 
##                               OFFICER_GENDER 
##                                        FALSE 
##                                 OFFICER_RACE 
##                                        FALSE 
##                            OFFICER_HIRE_DATE 
##                                        FALSE 
##                       OFFICER_YEARS_ON_FORCE 
##                                        FALSE 
##                               OFFICER_INJURY 
##                                        FALSE 
##                          OFFICER_INJURY_TYPE 
##                                        FALSE 
##                      OFFICER_HOSPITALIZATION 
##                                        FALSE 
##                                   SUBJECT_ID 
##                                        FALSE 
##                                 SUBJECT_RACE 
##                                        FALSE 
##                               SUBJECT_GENDER 
##                                        FALSE 
##                               SUBJECT_INJURY 
##                                        FALSE 
##                          SUBJECT_INJURY_TYPE 
##                                        FALSE 
##                         SUBJECT_WAS_ARRESTED 
##                                        FALSE 
##                          SUBJECT_DESCRIPTION 
##                                        FALSE 
##                              SUBJECT_OFFENSE 
##                                        FALSE 
##                               REPORTING_AREA 
##                                        FALSE 
##                                         BEAT 
##                                        FALSE 
##                                       SECTOR 
##                                        FALSE 
##                                     DIVISION 
##                                        FALSE 
##                            LOCATION_DISTRICT 
##                                        FALSE 
##                                STREET_NUMBER 
##                                        FALSE 
##                                  STREET_NAME 
##                                        FALSE 
##                             STREET_DIRECTION 
##                                        FALSE 
##                                  STREET_TYPE 
##                                        FALSE 
## LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION 
##                                        FALSE 
##                                LOCATION_CITY 
##                                        FALSE 
##                               LOCATION_STATE 
##                                        FALSE 
##                            LOCATION_LATITUDE 
##                                        FALSE 
##                           LOCATION_LONGITUDE 
##                                        FALSE 
##                              INCIDENT_REASON 
##                                        FALSE 
##                             REASON_FOR_FORCE 
##                                        FALSE 
##                          TYPE_OF_FORCE_USED1 
##                                        FALSE 
##                          TYPE_OF_FORCE_USED2 
##                                        FALSE 
##                          TYPE_OF_FORCE_USED3 
##                                        FALSE 
##                          TYPE_OF_FORCE_USED4 
##                                        FALSE 
##                          TYPE_OF_FORCE_USED5 
##                                        FALSE 
##                          TYPE_OF_FORCE_USED6 
##                                        FALSE 
##                          TYPE_OF_FORCE_USED7 
##                                        FALSE 
##                          TYPE_OF_FORCE_USED8 
##                                        FALSE 
##                          TYPE_OF_FORCE_USED9 
##                                        FALSE 
##                         TYPE_OF_FORCE_USED10 
##                                        FALSE 
##                             NUMBER_EC_CYCLES 
##                                        FALSE 
##                              FORCE_EFFECTIVE 
##                                        FALSE
head(data) # some columns clearly hold dates and numbers, so character is not the right type for them
##   INCIDENT_DATE INCIDENT_TIME    UOF_NUMBER OFFICER_ID OFFICER_GENDER
## 2        9/3/16    4:14:00 AM         37702      10810           Male
## 3       3/22/16   11:00:00 PM         33413       7706           Male
## 4       5/22/16    1:29:00 PM         34567      11014           Male
## 5       1/10/16    8:55:00 PM         31460       6692           Male
## 6       11/8/16    2:30:00 AM  37879, 37898       9844           Male
## 7       9/11/16    7:20:00 PM         36724       9855           Male
##   OFFICER_RACE OFFICER_HIRE_DATE OFFICER_YEARS_ON_FORCE OFFICER_INJURY
## 2        Black            5/7/14                      2             No
## 3        White            1/8/99                     17            Yes
## 4        Black           5/20/15                      1             No
## 5        Black           7/29/91                     24             No
## 6        White           10/4/09                      7             No
## 7        White           6/10/09                      7             No
##            OFFICER_INJURY_TYPE OFFICER_HOSPITALIZATION SUBJECT_ID SUBJECT_RACE
## 2 No injuries noted or visible                      No      46424        Black
## 3                Sprain/Strain                     Yes      44324     Hispanic
## 4 No injuries noted or visible                      No      45126     Hispanic
## 5 No injuries noted or visible                      No      43150     Hispanic
## 6 No injuries noted or visible                      No      47307        Black
## 7 No injuries noted or visible                      No      46549        White
##   SUBJECT_GENDER SUBJECT_INJURY          SUBJECT_INJURY_TYPE
## 2         Female            Yes      Non-Visible Injury/Pain
## 3           Male             No No injuries noted or visible
## 4           Male             No No injuries noted or visible
## 5           Male            Yes               Laceration/Cut
## 6           Male             No No injuries noted or visible
## 7         Female             No No injuries noted or visible
##   SUBJECT_WAS_ARRESTED SUBJECT_DESCRIPTION          SUBJECT_OFFENSE
## 2                  Yes   Mentally unstable                    APOWW
## 3                  Yes   Mentally unstable                    APOWW
## 4                  Yes             Unknown                    APOWW
## 5                  Yes FD-Unknown if Armed           Evading Arrest
## 6                  Yes             Unknown Other Misdemeanor Arrest
## 7                  Yes             Unknown               Assault/FV
##   REPORTING_AREA BEAT SECTOR      DIVISION LOCATION_DISTRICT STREET_NUMBER
## 2           2062  134    130       CENTRAL               D14           211
## 3           1197  237    230     NORTHEAST                D9          7647
## 4           4153  432    430     SOUTHWEST                D6           716
## 5           4523  641    640 NORTH CENTRAL               D11          5600
## 6           2167  346    340     SOUTHEAST                D7          4600
## 7           1134  235    230     NORTHEAST                D9          1234
##    STREET_NAME STREET_DIRECTION STREET_TYPE
## 2        Ervay                N         St.
## 3     Ferguson             NULL         Rd.
## 4 bimebella dr             NULL         Ln.
## 5          LBJ             NULL       Frwy.
## 6    Malcolm X                S       Blvd.
## 7        Peavy             NULL         Rd.
##   LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION LOCATION_CITY LOCATION_STATE
## 2                               211 N ERVAY ST        Dallas             TX
## 3                             7647 FERGUSON RD        Dallas             TX
## 4                             716 BIMEBELLA LN        Dallas             TX
## 5                               5600 L B J FWY        Dallas             TX
## 6                        4600 S MALCOLM X BLVD        Dallas             TX
## 7                                1234 PEAVY RD        Dallas             TX
##   LOCATION_LATITUDE LOCATION_LONGITUDE INCIDENT_REASON REASON_FOR_FORCE
## 2         32.782205         -96.797461          Arrest           Arrest
## 3         32.798978         -96.717493          Arrest           Arrest
## 4          32.73971          -96.92519          Arrest           Arrest
## 5                                               Arrest           Arrest
## 6                                               Arrest           Arrest
## 7         32.837527         -96.695566          Arrest           Arrest
##      TYPE_OF_FORCE_USED1 TYPE_OF_FORCE_USED2 TYPE_OF_FORCE_USED3
## 2  Hand/Arm/Elbow Strike                                        
## 3            Joint Locks                                        
## 4      Take Down - Group                                        
## 5         K-9 Deployment                                        
## 6         Verbal Command     Take Down - Arm                    
## 7 Hand Controlled Escort                                        
##   TYPE_OF_FORCE_USED4 TYPE_OF_FORCE_USED5 TYPE_OF_FORCE_USED6
## 2                                                            
## 3                                                            
## 4                                                            
## 5                                                            
## 6                                                            
## 7                                                            
##   TYPE_OF_FORCE_USED7 TYPE_OF_FORCE_USED8 TYPE_OF_FORCE_USED9
## 2                                                            
## 3                                                            
## 4                                                            
## 5                                                            
## 6                                                            
## 7                                                            
##   TYPE_OF_FORCE_USED10 NUMBER_EC_CYCLES FORCE_EFFECTIVE
## 2                                  NULL             Yes
## 3                                  NULL             Yes
## 4                                  NULL             Yes
## 5                                  NULL             Yes
## 6                                  NULL         No, Yes
## 7                                  NULL             Yes

Further transformations need to be done so that appropriate analysis techniques can be applied. Numerical, categorical and time series data cannot be analysed in the same way. Listing the names of the variables gives an understanding of what they mean and how they should be transformed, if at all.

names(data)  
##  [1] "INCIDENT_DATE"                               
##  [2] "INCIDENT_TIME"                               
##  [3] "UOF_NUMBER"                                  
##  [4] "OFFICER_ID"                                  
##  [5] "OFFICER_GENDER"                              
##  [6] "OFFICER_RACE"                                
##  [7] "OFFICER_HIRE_DATE"                           
##  [8] "OFFICER_YEARS_ON_FORCE"                      
##  [9] "OFFICER_INJURY"                              
## [10] "OFFICER_INJURY_TYPE"                         
## [11] "OFFICER_HOSPITALIZATION"                     
## [12] "SUBJECT_ID"                                  
## [13] "SUBJECT_RACE"                                
## [14] "SUBJECT_GENDER"                              
## [15] "SUBJECT_INJURY"                              
## [16] "SUBJECT_INJURY_TYPE"                         
## [17] "SUBJECT_WAS_ARRESTED"                        
## [18] "SUBJECT_DESCRIPTION"                         
## [19] "SUBJECT_OFFENSE"                             
## [20] "REPORTING_AREA"                              
## [21] "BEAT"                                        
## [22] "SECTOR"                                      
## [23] "DIVISION"                                    
## [24] "LOCATION_DISTRICT"                           
## [25] "STREET_NUMBER"                               
## [26] "STREET_NAME"                                 
## [27] "STREET_DIRECTION"                            
## [28] "STREET_TYPE"                                 
## [29] "LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION"
## [30] "LOCATION_CITY"                               
## [31] "LOCATION_STATE"                              
## [32] "LOCATION_LATITUDE"                           
## [33] "LOCATION_LONGITUDE"                          
## [34] "INCIDENT_REASON"                             
## [35] "REASON_FOR_FORCE"                            
## [36] "TYPE_OF_FORCE_USED1"                         
## [37] "TYPE_OF_FORCE_USED2"                         
## [38] "TYPE_OF_FORCE_USED3"                         
## [39] "TYPE_OF_FORCE_USED4"                         
## [40] "TYPE_OF_FORCE_USED5"                         
## [41] "TYPE_OF_FORCE_USED6"                         
## [42] "TYPE_OF_FORCE_USED7"                         
## [43] "TYPE_OF_FORCE_USED8"                         
## [44] "TYPE_OF_FORCE_USED9"                         
## [45] "TYPE_OF_FORCE_USED10"                        
## [46] "NUMBER_EC_CYCLES"                            
## [47] "FORCE_EFFECTIVE"
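
Reading the names is one way to decide which variables need converting. As a complementary sketch, the example below tabulates the current column types of a small hypothetical data frame (`toy`) and flags character columns whose non-missing values all parse as numbers. The helper `looks_numeric` is an illustrative name, and treating "" and "NULL" as missing is an assumption.

```r
# Toy data frame with all columns stored as character, as in the data set
toy <- data.frame(
  INCIDENT_DATE          = c("9/3/16", "3/22/16", "5/22/16"),
  OFFICER_YEARS_ON_FORCE = c("2", "17", "1"),
  LOCATION_LATITUDE      = c("32.782205", "32.798978", ""),
  DIVISION               = c("CENTRAL", "NORTHEAST", "SOUTHWEST"),
  stringsAsFactors       = FALSE
)

# Compact overview: how many columns of each type
col_types <- vapply(toy, function(x) class(x)[1], character(1))
print(table(col_types))

# Hypothetical helper: TRUE if every non-missing value parses as a number
looks_numeric <- function(x) {
  x <- x[!is.na(x) & !(trimws(x) %in% c("", "NULL"))]
  length(x) > 0 && all(!is.na(suppressWarnings(as.numeric(x))))
}

# Character columns that are candidates for numeric conversion
candidates <- names(toy)[vapply(toy, looks_numeric, logical(1))]
print(candidates)
```

A column flagged this way is only a candidate: identifiers such as beat or sector codes parse as numbers but may be better treated as categories.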

Two variables record the moment an incident occurred: INCIDENT_DATE and INCIDENT_TIME. They are combined into a single variable, INCIDENT_DATE_AND_TIME, which makes it easier to look for patterns over time and to create clearer visuals. The INCIDENT_TIME variable can be discarded after this step. Note that the combined variable is still classified as character at this point.

# Combine the date and time of each incident into a single string
data$INCIDENT_DATE_AND_TIME <- paste(data$INCIDENT_DATE, data$INCIDENT_TIME)
head(data$INCIDENT_DATE_AND_TIME) 
## [1] "9/3/16 4:14:00 AM"   "3/22/16 11:00:00 PM" "5/22/16 1:29:00 PM" 
## [4] "1/10/16 8:55:00 PM"  "11/8/16 2:30:00 AM"  "9/11/16 7:20:00 PM"
class(data$INCIDENT_DATE_AND_TIME) # gives the class of the variable
## [1] "character"
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
data <- select(data, -INCIDENT_TIME)
dim(data)#data frame should have 47 variables
## [1] 2383   47

The conversion of variables to their right form starts with those that represent time and date. Date-times are stored in the “POSIXlt” class and dates are stored in the “Date” class.

The INCIDENT_DATE_AND_TIME variable is transformed to the “POSIXlt” “POSIXt” class, whereas the OFFICER_HIRE_DATE and INCIDENT_DATE variables are transformed into the “Date” class.

data$INCIDENT_DATE_AND_TIME  <- strptime(data$INCIDENT_DATE_AND_TIME, format = "%m/%d/%y %I:%M:%S %p")
class(data$INCIDENT_DATE_AND_TIME)#ensure that it's in the right form
## [1] "POSIXlt" "POSIXt"
data$OFFICER_HIRE_DATE <- as.Date(data$OFFICER_HIRE_DATE, format = "%m/%d/%Y")
class(data$OFFICER_HIRE_DATE) # ensure that it's in the right form
## [1] "Date"
data$INCIDENT_DATE <- as.Date(data$INCIDENT_DATE, format = "%m/%d/%Y")
class(data$INCIDENT_DATE) # ensure that it's in the right form
## [1] "Date"
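
Two-digit years deserve a check at this step. In R's date formats, `%Y` is the year with century and `%y` is the two-digit year, so the choice between them changes the result for values such as "9/3/16". The sketch below uses made-up example strings (not rows read from the data) to show the difference, and also shows that a date-time string with an invalid time part parses to NA. It assumes an English locale for the AM/PM marker.

```r
# Hypothetical example values in the same layout as INCIDENT_DATE
dates <- c("9/3/16", "1/8/99")

# %Y expects a year with century, so "16" is read as the year 16
as_four_digit <- as.Date(dates, format = "%m/%d/%Y")
# %y expects a two-digit year: 00-68 map to 20xx, 69-99 map to 19xx
as_two_digit <- as.Date(dates, format = "%m/%d/%y")

as.integer(format(as_four_digit, "%Y"))
format(as_two_digit, "%Y")

# A date-time string whose time part is not a valid time parses to NA
stamps <- c("9/3/16 4:14:00 AM", "9/3/16 NULL")
parsed <- strptime(stamps, format = "%m/%d/%y %I:%M:%S %p", tz = "UTC")
is.na(parsed)
```

Inspecting a few converted values with `head()` right after the conversion is a quick way to confirm that the years came out as intended.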

The following variables are transformed to numeric form:

  • UOF_NUMBER

  • OFFICER_ID

  • OFFICER_YEARS_ON_FORCE

  • SUBJECT_ID

  • REPORTING_AREA

  • BEAT

  • SECTOR

  • STREET_NUMBER

  • LOCATION_LATITUDE

  • LOCATION_LONGITUDE

data$UOF_NUMBER <- as.numeric(as.character(data$UOF_NUMBER))
## Warning: NAs introduced by coercion
class(data$UOF_NUMBER) # check if the variable is numeric
## [1] "numeric"
data$OFFICER_ID <- as.numeric(as.character(data$OFFICER_ID))
class(data$OFFICER_ID) # check if the variable is numeric
## [1] "numeric"
data$OFFICER_YEARS_ON_FORCE <- as.numeric(as.character(data$OFFICER_YEARS_ON_FORCE))
class(data$OFFICER_YEARS_ON_FORCE) # check if the variable is numeric
## [1] "numeric"
data$SUBJECT_ID <- as.numeric(as.character(data$SUBJECT_ID))
class(data$SUBJECT_ID) # check if the variable is numeric
## [1] "numeric"
data$REPORTING_AREA <- as.numeric(as.character(data$REPORTING_AREA))
class(data$REPORTING_AREA) # check if the variable is numeric
## [1] "numeric"
data$BEAT <- as.numeric(as.character(data$BEAT))
class(data$BEAT) # check if the variable is numeric
## [1] "numeric"
data$SECTOR <- as.numeric(as.character(data$SECTOR))
class(data$SECTOR) # check if the variable is numeric
## [1] "numeric"
data$STREET_NUMBER <- as.numeric(as.character(data$STREET_NUMBER))
class(data$STREET_NUMBER) # check if the variable is numeric
## [1] "numeric"
data$LOCATION_LATITUDE <- as.numeric(as.character(data$LOCATION_LATITUDE))
class(data$LOCATION_LATITUDE) # check if the variable is numeric
## [1] "numeric"
data$LOCATION_LONGITUDE <- as.numeric(as.character(data$LOCATION_LONGITUDE))
class(data$LOCATION_LONGITUDE) # check if the variable is numeric
## [1] "numeric"
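
The ten conversions above repeat one pattern, so they could also be written as a single loop over the column names. The following is a minimal sketch on a small made-up data frame (the name `toy` and its values are illustrative assumptions, not rows from the policing data). It also counts how many entries become NA during coercion, which is the situation the warning for UOF_NUMBER reports.

```r
# Hypothetical stand-in for the policing data frame
toy <- data.frame(
  UOF_NUMBER = c("37702", "33413", "unknown"),  # last entry is made up
  BEAT       = c("134", "237", "432"),
  DIVISION   = c("CENTRAL", "NORTHEAST", "SOUTHWEST"),
  stringsAsFactors = FALSE
)

numeric_cols <- c("UOF_NUMBER", "BEAT")

for (col in numeric_cols) {
  # suppressWarnings() hides "NAs introduced by coercion";
  # the NA count below reports the same information explicitly
  converted <- suppressWarnings(as.numeric(as.character(toy[[col]])))
  new_na <- sum(is.na(converted) & !is.na(toy[[col]]))
  message(col, ": ", new_na, " value(s) became NA")
  toy[[col]] <- converted
}

sapply(toy, class)
```

Reporting the number of newly created NA values per column makes it easier to decide whether a variable is really numeric or whether some entries need separate handling first.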

The variable NUMBER_EC_CYCLES appears to record the number of electronic control cycles used in an incident. Electronic control devices include Tasers, which use electrical current to immobilize a subject, so the number of cycles may provide insight into the level of force used and could be relevant to an investigation of the incident. The variable holds a mix of inputs: single numbers, the text “NULL”, and entries with more than one number such as “ 2, 4”. It was therefore excluded from the numeric transformation, since some information would have been lost, and it remains of class character.

unique(data$NUMBER_EC_CYCLES)
##  [1] "NULL"  "1"     "3"     "2"     "4"     " 2, 4" "5"     "0"     " 1, 1"
## [10] " 3, 2" " 3, 3" "6"
class(data$NUMBER_EC_CYCLES)
## [1] "character"
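
If a numeric summary of this variable is needed later, one option is to keep the character column and derive a separate total from it. The sketch below is an assumption about how such entries might be handled, not part of the original analysis: it treats "NULL" as missing and sums comma-separated counts, using a few of the values printed above. The helper name `total_cycles` is hypothetical.

```r
# A few of the values returned by unique(data$NUMBER_EC_CYCLES)
cycles <- c("NULL", "1", " 2, 4", "0", " 3, 3")

# Hypothetical helper: sum the comma-separated counts in one entry
total_cycles <- function(x) {
  x <- trimws(x)
  if (is.na(x) || x == "NULL") return(NA_real_)
  parts <- strsplit(x, ",", fixed = TRUE)[[1]]
  sum(as.numeric(trimws(parts)))
}

totals <- vapply(cycles, total_cycles, numeric(1), USE.NAMES = FALSE)
totals
## [1] NA  1  6  0  6
```

Summing is only one possible reading of entries such as " 2, 4"; keeping the original column alongside the derived one preserves the raw information.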

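The structure of the data frame after these transformations is shown below. The variables have the intended classes, although the two Date variables display years such as 0016 and 0099: the format string “%m/%d/%Y” reads the two-digit years in the raw data literally, and “%m/%d/%y” would be needed to obtain 2016 and 1999.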

str(data)
## 'data.frame':    2383 obs. of  47 variables:
##  $ INCIDENT_DATE                               : Date, format: "0016-09-03" "0016-03-22" ...
##  $ UOF_NUMBER                                  : num  37702 33413 34567 31460 NA ...
##  $ OFFICER_ID                                  : num  10810 7706 11014 6692 9844 ...
##  $ OFFICER_GENDER                              : chr  "Male" "Male" "Male" "Male" ...
##  $ OFFICER_RACE                                : chr  "Black" "White" "Black" "Black" ...
##  $ OFFICER_HIRE_DATE                           : Date, format: "0014-05-07" "0099-01-08" ...
##  $ OFFICER_YEARS_ON_FORCE                      : num  2 17 1 24 7 7 7 9 4 8 ...
##  $ OFFICER_INJURY                              : chr  "No" "Yes" "No" "No" ...
##  $ OFFICER_INJURY_TYPE                         : chr  "No injuries noted or visible" "Sprain/Strain" "No injuries noted or visible" "No injuries noted or visible" ...
##  $ OFFICER_HOSPITALIZATION                     : chr  "No" "Yes" "No" "No" ...
##  $ SUBJECT_ID                                  : num  46424 44324 45126 43150 47307 ...
##  $ SUBJECT_RACE                                : chr  "Black" "Hispanic" "Hispanic" "Hispanic" ...
##  $ SUBJECT_GENDER                              : chr  "Female" "Male" "Male" "Male" ...
##  $ SUBJECT_INJURY                              : chr  "Yes" "No" "No" "Yes" ...
##  $ SUBJECT_INJURY_TYPE                         : chr  "Non-Visible Injury/Pain" "No injuries noted or visible" "No injuries noted or visible" "Laceration/Cut" ...
##  $ SUBJECT_WAS_ARRESTED                        : chr  "Yes" "Yes" "Yes" "Yes" ...
##  $ SUBJECT_DESCRIPTION                         : chr  "Mentally unstable" "Mentally unstable" "Unknown" "FD-Unknown if Armed" ...
##  $ SUBJECT_OFFENSE                             : chr  "APOWW" "APOWW" "APOWW" "Evading Arrest" ...
##  $ REPORTING_AREA                              : num  2062 1197 4153 4523 2167 ...
##  $ BEAT                                        : num  134 237 432 641 346 235 132 515 133 614 ...
##  $ SECTOR                                      : num  130 230 430 640 340 230 130 510 130 610 ...
##  $ DIVISION                                    : chr  "CENTRAL" "NORTHEAST" "SOUTHWEST" "NORTH CENTRAL" ...
##  $ LOCATION_DISTRICT                           : chr  "D14" "D9" "D6" "D11" ...
##  $ STREET_NUMBER                               : num  211 7647 716 5600 4600 ...
##  $ STREET_NAME                                 : chr  "Ervay" "Ferguson" "bimebella dr" "LBJ" ...
##  $ STREET_DIRECTION                            : chr  "N" "NULL" "NULL" "NULL" ...
##  $ STREET_TYPE                                 : chr  "St." "Rd." "Ln." "Frwy." ...
##  $ LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION: chr  "211 N ERVAY ST" "7647 FERGUSON RD" "716 BIMEBELLA LN" "5600 L B J FWY" ...
##  $ LOCATION_CITY                               : chr  "Dallas" "Dallas" "Dallas" "Dallas" ...
##  $ LOCATION_STATE                              : chr  "TX" "TX" "TX" "TX" ...
##  $ LOCATION_LATITUDE                           : num  32.8 32.8 32.7 NA NA ...
##  $ LOCATION_LONGITUDE                          : num  -96.8 -96.7 -96.9 NA NA ...
##  $ INCIDENT_REASON                             : chr  "Arrest" "Arrest" "Arrest" "Arrest" ...
##  $ REASON_FOR_FORCE                            : chr  "Arrest" "Arrest" "Arrest" "Arrest" ...
##  $ TYPE_OF_FORCE_USED1                         : chr  "Hand/Arm/Elbow Strike" "Joint Locks" "Take Down - Group" "K-9 Deployment" ...
##  $ TYPE_OF_FORCE_USED2                         : chr  "" "" "" "" ...
##  $ TYPE_OF_FORCE_USED3                         : chr  "" "" "" "" ...
##  $ TYPE_OF_FORCE_USED4                         : chr  "" "" "" "" ...
##  $ TYPE_OF_FORCE_USED5                         : chr  "" "" "" "" ...
##  $ TYPE_OF_FORCE_USED6                         : chr  "" "" "" "" ...
##  $ TYPE_OF_FORCE_USED7                         : chr  "" "" "" "" ...
##  $ TYPE_OF_FORCE_USED8                         : chr  "" "" "" "" ...
##  $ TYPE_OF_FORCE_USED9                         : chr  "" "" "" "" ...
##  $ TYPE_OF_FORCE_USED10                        : chr  "" "" "" "" ...
##  $ NUMBER_EC_CYCLES                            : chr  "NULL" "NULL" "NULL" "NULL" ...
##  $ FORCE_EFFECTIVE                             : chr  " Yes" " Yes" " Yes" " Yes" ...
##  $ INCIDENT_DATE_AND_TIME                      : POSIXlt, format: "2016-09-03 04:14:00" "2016-03-22 23:00:00" ...

The number of missing values affects the choice of data analysis techniques. Knowing it helps to determine which techniques are appropriate and to ensure that the results are valid.

There are 1756 missing values in total, and only 4 variables have missing inputs: UOF_NUMBER (1636), LOCATION_LATITUDE (55), LOCATION_LONGITUDE (55) and INCIDENT_DATE_AND_TIME (10). This does not act as a major barrier to conducting a conclusive analysis. The missing values are discussed and explored further in the next section.

sum(is.na(data)) # there are 1756 missing values in the data frame
## [1] 1756
sapply(data, function(x) sum(is.na(x)))#No. of missing values in each column of the data frame
##                                INCIDENT_DATE 
##                                            0 
##                                   UOF_NUMBER 
##                                         1636 
##                                   OFFICER_ID 
##                                            0 
##                               OFFICER_GENDER 
##                                            0 
##                                 OFFICER_RACE 
##                                            0 
##                            OFFICER_HIRE_DATE 
##                                            0 
##                       OFFICER_YEARS_ON_FORCE 
##                                            0 
##                               OFFICER_INJURY 
##                                            0 
##                          OFFICER_INJURY_TYPE 
##                                            0 
##                      OFFICER_HOSPITALIZATION 
##                                            0 
##                                   SUBJECT_ID 
##                                            0 
##                                 SUBJECT_RACE 
##                                            0 
##                               SUBJECT_GENDER 
##                                            0 
##                               SUBJECT_INJURY 
##                                            0 
##                          SUBJECT_INJURY_TYPE 
##                                            0 
##                         SUBJECT_WAS_ARRESTED 
##                                            0 
##                          SUBJECT_DESCRIPTION 
##                                            0 
##                              SUBJECT_OFFENSE 
##                                            0 
##                               REPORTING_AREA 
##                                            0 
##                                         BEAT 
##                                            0 
##                                       SECTOR 
##                                            0 
##                                     DIVISION 
##                                            0 
##                            LOCATION_DISTRICT 
##                                            0 
##                                STREET_NUMBER 
##                                            0 
##                                  STREET_NAME 
##                                            0 
##                             STREET_DIRECTION 
##                                            0 
##                                  STREET_TYPE 
##                                            0 
## LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION 
##                                            0 
##                                LOCATION_CITY 
##                                            0 
##                               LOCATION_STATE 
##                                            0 
##                            LOCATION_LATITUDE 
##                                           55 
##                           LOCATION_LONGITUDE 
##                                           55 
##                              INCIDENT_REASON 
##                                            0 
##                             REASON_FOR_FORCE 
##                                            0 
##                          TYPE_OF_FORCE_USED1 
##                                            0 
##                          TYPE_OF_FORCE_USED2 
##                                            0 
##                          TYPE_OF_FORCE_USED3 
##                                            0 
##                          TYPE_OF_FORCE_USED4 
##                                            0 
##                          TYPE_OF_FORCE_USED5 
##                                            0 
##                          TYPE_OF_FORCE_USED6 
##                                            0 
##                          TYPE_OF_FORCE_USED7 
##                                            0 
##                          TYPE_OF_FORCE_USED8 
##                                            0 
##                          TYPE_OF_FORCE_USED9 
##                                            0 
##                         TYPE_OF_FORCE_USED10 
##                                            0 
##                             NUMBER_EC_CYCLES 
##                                            0 
##                              FORCE_EFFECTIVE 
##                                            0 
##                       INCIDENT_DATE_AND_TIME 
##                                           10
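
`is.na()` only counts values that R already stores as NA. The `str()` output earlier shows that some character columns hold the text "NULL" or an empty string instead, and these are not included in the totals above. As a hedged sketch, the following counts such placeholders on a small made-up data frame. Whether "NULL" and "" should be treated as missing in this data set is an assumption that would need checking against the data's documentation, and the helper name `count_missing` is hypothetical.

```r
# Hypothetical stand-in with the placeholder styles seen in the str() output
toy <- data.frame(
  STREET_DIRECTION    = c("N", "NULL", "NULL"),
  TYPE_OF_FORCE_USED2 = c("", "", "Joint Locks"),
  LOCATION_LATITUDE   = c(32.8, NA, 32.7),
  stringsAsFactors = FALSE
)

# Count NA values plus the text placeholders "NULL" and ""
count_missing <- function(x) {
  if (is.character(x)) {
    sum(is.na(x) | trimws(x) %in% c("NULL", ""))
  } else {
    sum(is.na(x))
  }
}

sapply(toy, count_missing)
```

Applied to the full data frame, a count of this kind would show how much the picture of missingness changes once placeholders are included.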

The NA counts above can be represented visually as shown below:

library(VIM)
## Loading required package: colorspace
## Loading required package: grid
## VIM is ready to use.
## Suggestions and bug-reports can be submitted at: https://github.com/statistikat/VIM/issues
## 
## Attaching package: 'VIM'
## The following object is masked from 'package:datasets':
## 
##     sleep
aggr_plot <- aggr(data, col=c('navyblue','red'),numbers=TRUE,sortVars=TRUE,
                  labels=names(data),cex.axis=.7,gap=3,
                  ylab=c("Histogram of Missing data","Pattern of Missing data"))  
## Warning in plot.aggr(res, ...): not enough horizontal space to display
## frequencies

## 
##  Variables sorted by number of missings: 
##                                      Variable       Count
##                                    UOF_NUMBER 0.686529585
##                             LOCATION_LATITUDE 0.023080151
##                            LOCATION_LONGITUDE 0.023080151
##                        INCIDENT_DATE_AND_TIME 0.004196391
##                                 INCIDENT_DATE 0.000000000
##                                    OFFICER_ID 0.000000000
##                                OFFICER_GENDER 0.000000000
##                                  OFFICER_RACE 0.000000000
##                             OFFICER_HIRE_DATE 0.000000000
##                        OFFICER_YEARS_ON_FORCE 0.000000000
##                                OFFICER_INJURY 0.000000000
##                           OFFICER_INJURY_TYPE 0.000000000
##                       OFFICER_HOSPITALIZATION 0.000000000
##                                    SUBJECT_ID 0.000000000
##                                  SUBJECT_RACE 0.000000000
##                                SUBJECT_GENDER 0.000000000
##                                SUBJECT_INJURY 0.000000000
##                           SUBJECT_INJURY_TYPE 0.000000000
##                          SUBJECT_WAS_ARRESTED 0.000000000
##                           SUBJECT_DESCRIPTION 0.000000000
##                               SUBJECT_OFFENSE 0.000000000
##                                REPORTING_AREA 0.000000000
##                                          BEAT 0.000000000
##                                        SECTOR 0.000000000
##                                      DIVISION 0.000000000
##                             LOCATION_DISTRICT 0.000000000
##                                 STREET_NUMBER 0.000000000
##                                   STREET_NAME 0.000000000
##                              STREET_DIRECTION 0.000000000
##                                   STREET_TYPE 0.000000000
##  LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION 0.000000000
##                                 LOCATION_CITY 0.000000000
##                                LOCATION_STATE 0.000000000
##                               INCIDENT_REASON 0.000000000
##                              REASON_FOR_FORCE 0.000000000
##                           TYPE_OF_FORCE_USED1 0.000000000
##                           TYPE_OF_FORCE_USED2 0.000000000
##                           TYPE_OF_FORCE_USED3 0.000000000
##                           TYPE_OF_FORCE_USED4 0.000000000
##                           TYPE_OF_FORCE_USED5 0.000000000
##                           TYPE_OF_FORCE_USED6 0.000000000
##                           TYPE_OF_FORCE_USED7 0.000000000
##                           TYPE_OF_FORCE_USED8 0.000000000
##                           TYPE_OF_FORCE_USED9 0.000000000
##                          TYPE_OF_FORCE_USED10 0.000000000
##                              NUMBER_EC_CYCLES 0.000000000
##                               FORCE_EFFECTIVE 0.000000000
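
The Count column in this output is the share of rows that are missing rather than a raw count. For example, 1636 missing UOF_NUMBER values out of 2383 rows gives about 0.687. The same proportions could be obtained in base R with `colMeans(is.na(...))`, sketched here on a made-up data frame (the name `toy` and its values are illustrative only).

```r
# Hypothetical stand-in for the policing data frame
toy <- data.frame(
  UOF_NUMBER        = c(37702, NA, NA, 31460),
  LOCATION_LATITUDE = c(32.8, 32.8, NA, 32.7),
  DIVISION          = c("CENTRAL", "NORTHEAST", "SOUTHWEST", "CENTRAL"),
  stringsAsFactors = FALSE
)

# is.na() gives a logical matrix; its column means are the missing shares
missing_share <- colMeans(is.na(toy))
sort(missing_share, decreasing = TRUE)

# The proportion reported for UOF_NUMBER in the full data set
1636 / 2383
```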

The variables in the Policing Data set

The variables in the data set describe incidents that were handled by the police in Dallas, Texas in 2016. There were 2383 recorded incidents in that year. The time and place of each incident are given. The officers who handled each incident are identified by the OFFICER_ID variable, and the subjects involved are identified by the SUBJECT_ID variable. Further variables give characteristics of the officers and subjects involved, and others describe the actions and events that took place in a given incident.